options(knitr.duplicate.label = 'allow')

Chapter 5 - Dimensionality Reduction techniques

1. Here are some correlations between the variables:

human <- read.table("http://s3.amazonaws.com/assets.datacamp.com/production/course_2218/datasets/human2.txt", sep= ",", header=TRUE, row.names = 1)
library(GGally)
## Loading required package: ggplot2
library(corrplot)
## corrplot 0.84 loaded
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:GGally':
## 
##     nasa
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)

ggpairs(human)

cor(human)
##                Edu2.FM      Labo.FM     Edu.Exp   Life.Exp         GNI
## Edu2.FM    1.000000000  0.009564039  0.59325156  0.5760299  0.43030485
## Labo.FM    0.009564039  1.000000000  0.04732183 -0.1400125 -0.02173971
## Edu.Exp    0.593251562  0.047321827  1.00000000  0.7894392  0.62433940
## Life.Exp   0.576029853 -0.140012504  0.78943917  1.0000000  0.62666411
## GNI        0.430304846 -0.021739705  0.62433940  0.6266641  1.00000000
## Mat.Mor   -0.660931770  0.240461075 -0.73570257 -0.8571684 -0.49516234
## Ado.Birth -0.529418415  0.120158862 -0.70356489 -0.7291774 -0.55656208
## Parli.F    0.078635285  0.250232608  0.20608156  0.1700863  0.08920818
##              Mat.Mor  Ado.Birth     Parli.F
## Edu2.FM   -0.6609318 -0.5294184  0.07863528
## Labo.FM    0.2404611  0.1201589  0.25023261
## Edu.Exp   -0.7357026 -0.7035649  0.20608156
## Life.Exp  -0.8571684 -0.7291774  0.17008631
## GNI       -0.4951623 -0.5565621  0.08920818
## Mat.Mor    1.0000000  0.7586615 -0.08944000
## Ado.Birth  0.7586615  1.0000000 -0.07087810
## Parli.F   -0.0894400 -0.0708781  1.00000000
dim(human)
## [1] 155   8
str(human)
## 'data.frame':    155 obs. of  8 variables:
##  $ Edu2.FM  : num  1.007 0.997 0.983 0.989 0.969 ...
##  $ Labo.FM  : num  0.891 0.819 0.825 0.884 0.829 ...
##  $ Edu.Exp  : num  17.5 20.2 15.8 18.7 17.9 16.5 18.6 16.5 15.9 19.2 ...
##  $ Life.Exp : num  81.6 82.4 83 80.2 81.6 80.9 80.9 79.1 82 81.8 ...
##  $ GNI      : int  64992 42261 56431 44025 45435 43919 39568 52947 42155 32689 ...
##  $ Mat.Mor  : int  4 6 6 5 6 7 9 28 11 8 ...
##  $ Ado.Birth: num  7.8 12.1 1.9 5.1 6.2 3.8 8.2 31 14.5 25.3 ...
##  $ Parli.F  : num  39.6 30.5 28.5 38 36.9 36.9 19.9 19.4 28.2 31.4 ...
colnames(human)
## [1] "Edu2.FM"   "Labo.FM"   "Edu.Exp"   "Life.Exp"  "GNI"       "Mat.Mor"  
## [7] "Ado.Birth" "Parli.F"
head(human)
##               Edu2.FM   Labo.FM Edu.Exp Life.Exp   GNI Mat.Mor Ado.Birth
## Norway      1.0072389 0.8908297    17.5     81.6 64992       4       7.8
## Australia   0.9968288 0.8189415    20.2     82.4 42261       6      12.1
## Switzerland 0.9834369 0.8251001    15.8     83.0 56431       6       1.9
## Denmark     0.9886128 0.8840361    18.7     80.2 44025       5       5.1
## Netherlands 0.9690608 0.8286119    17.9     81.6 45435       6       6.2
## Germany     0.9927835 0.8072289    16.5     80.9 43919       7       3.8
##             Parli.F
## Norway         39.6
## Australia      30.5
## Switzerland    28.5
## Denmark        38.0
## Netherlands    36.9
## Germany        36.9

The dataset used in this exercise has 155 observations (countries) and eight variables. “GNI” and “Mat.Mor” are stored as integers; the other six variables are stored as doubles, so all eight are numeric. The variables are explained in the following table.

Variable - Explanation

Edu2.FM - ratio of females to males with secondary education
Labo.FM - ratio of females to males in the labour force
Edu.Exp - expected years of schooling
Life.Exp - life expectancy at birth
GNI - gross national income per capita
Mat.Mor - maternal mortality ratio
Ado.Birth - adolescent birth rate
Parli.F - percentage of female representatives in parliament

options(knitr.duplicate.label = 'allow', debug = TRUE)
library(pander)
## 
## Attaching package: 'pander'
## The following object is masked from 'package:GGally':
## 
##     wrap
pandoc.table(summary(human), caption = "Summary of Human data", split.table = 80)
## 
## -----------------------------------------------------------------
##     Edu2.FM          Labo.FM          Edu.Exp        Life.Exp    
## ---------------- ---------------- --------------- ---------------
##  Min.  :0.1717    Min.  :0.1857    Min.  : 5.40    Min.  :49.00  
## 
##  1st Qu.:0.7264   1st Qu.:0.5984   1st Qu.:11.25   1st Qu.:66.30 
## 
##  Median :0.9375   Median :0.7535   Median :13.50   Median :74.20 
## 
##   Mean :0.8529     Mean :0.7074     Mean :13.18     Mean :71.65  
## 
##  3rd Qu.:0.9968   3rd Qu.:0.8535   3rd Qu.:15.20   3rd Qu.:77.25 
## 
##  Max.  :1.4967    Max.  :1.0380    Max.  :20.20    Max.  :83.50  
## -----------------------------------------------------------------
## 
## Table: Summary of Human data (continued below)
## 
##  
## ------------------------------------------------------------------
##       GNI            Mat.Mor         Ado.Birth         Parli.F    
## ---------------- ---------------- ---------------- ---------------
##   Min.  : 581      Min.  : 1.0      Min.  : 0.60    Min.  : 0.00  
## 
##  1st Qu.: 4198    1st Qu.: 11.5    1st Qu.: 12.65   1st Qu.:12.40 
## 
##  Median : 12040   Median : 49.0    Median : 33.60   Median :19.30 
## 
##   Mean : 17628     Mean : 149.1     Mean : 47.16     Mean :20.91  
## 
##  3rd Qu.: 24512   3rd Qu.: 190.0   3rd Qu.: 71.95   3rd Qu.:27.95 
## 
##  Max.  :123124    Max.  :1100.0    Max.  :204.80    Max.  :57.50  
## ------------------------------------------------------------------
ggpairs(human, mapping = aes(alpha = 0.3), lower = list(combo = wrap("facethist")))

The summary and the correlation matrix show some clear patterns. The adolescent birth rate (Ado.Birth) is positively correlated (0.759) with the maternal mortality ratio (Mat.Mor) and negatively correlated (-0.729) with life expectancy at birth (Life.Exp); the strongest correlation in the matrix is the negative one between Mat.Mor and Life.Exp (-0.857). The ratio of females to males with secondary education (Edu2.FM) and expected years of schooling (Edu.Exp) are both positively correlated with Life.Exp (0.576 and 0.789). In contrast, the ratio of females to males in the labour force (Labo.FM) is almost uncorrelated with Edu.Exp (0.047) and GNI (-0.022).
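
An 8 x 8 correlation matrix is easier to scan when the variable pairs are ranked by strength. The following is a minimal sketch of one way to do that in base R. `top_correlations` is a hypothetical helper written for this illustration, and the built-in `mtcars` data stands in for `human` so that the example runs without a download.

```r
# Rank variable pairs by absolute Pearson correlation.
# `top_correlations` is a hypothetical helper, not part of any package.
top_correlations <- function(df, n = 5) {
  cm <- cor(df)
  # Look only at the upper triangle so that each pair appears once.
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
  # Strongest relationships first, regardless of sign.
  pairs <- pairs[order(-abs(pairs$r)), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

# mtcars ships with R and is used here only as a stand-in for `human`.
top3 <- top_correlations(mtcars, n = 3)
print(top3)
```

Calling `top_correlations(human)` in this chapter should list the Mat.Mor and Life.Exp pair first, since -0.857 is the largest absolute value in the matrix printed above.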

PCA and a biplot (in a couple of different ways)

In this section we run a principal component analysis (PCA), summarize the principal components and draw a biplot. PCA is done first on the non-standardized data and then on the standardized data.

pca_human<-prcomp(human)
biplot(pca_human, choices = 1:2, cex=c(0.8,1), col=c("grey40", "deeppink2"))
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

sum_pca_human<-summary(pca_human)
sum_pca_human
## Importance of components:
##                              PC1      PC2   PC3   PC4   PC5   PC6    PC7
## Standard deviation     1.854e+04 185.5219 25.19 11.45 3.766 1.566 0.1912
## Proportion of Variance 9.999e-01   0.0001  0.00  0.00 0.000 0.000 0.0000
## Cumulative Proportion  9.999e-01   1.0000  1.00  1.00 1.000 1.000 1.0000
##                           PC8
## Standard deviation     0.1591
## Proportion of Variance 0.0000
## Cumulative Proportion  1.0000
sum_pca_human_var<-sum_pca_human$sdev^2
sum_pca_human_var
## [1] 3.438860e+08 3.441836e+04 6.343853e+02 1.312035e+02 1.418457e+01
## [6] 2.452081e+00 3.655943e-02 2.531638e-02
pca_pr <- round(100*sum_pca_human$importance[2, ], digits = 1)
pc_lab<-paste0(names(pca_pr), " (", pca_pr, "%)")
biplot(pca_human, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = pc_lab[1], ylab = pc_lab[2], main = "PCA plot of non-scaled human data")
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

#biplot(pca_human, choices = 1:2, cex = c(1, 1), col = c("grey40", "deeppink2"),sub = "PC1 & PC2 with non-standardised dataset")

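The biplot above gives little insight into the data because a single variable, “GNI”, dominates it. GNI values run into the tens of thousands, so its variance is several orders of magnitude larger than that of any other variable, and PC1, which accounts for 99.99% of the total variance, is essentially GNI alone. The arrows of the other variables are so short that they are most likely the ones skipped in the zero-length arrow warnings.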
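
The effect of measurement scale on an unscaled PCA can be reproduced with a toy example. The sketch below uses two made-up, independent variables (`income` and `ratio` are hypothetical names chosen to mimic the difference between GNI and a ratio variable); it is an illustration of the mechanism, not a re-analysis of the human data.

```r
set.seed(1)
n <- 200
# Two hypothetical, independent variables on very different scales.
toy <- data.frame(
  income = rnorm(n, mean = 17000, sd = 18000),  # values in the thousands
  ratio  = runif(n, min = 0.2, max = 1)         # values between 0.2 and 1
)

# The variances differ by many orders of magnitude.
print(sapply(toy, var))

# PCA without scaling: the component variances mirror the raw variances.
pca_raw <- prcomp(toy)
share_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
print(round(share_raw, 4))

# The PC1 loading sits almost entirely on the large-scale variable.
print(abs(pca_raw$rotation[, "PC1"]))
```

Although the two variables are unrelated, PC1 captures practically all of the variance simply because `income` is measured in much larger units.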

Next, we will scale the variables in the human data and compute principal components and plot the results.

human_std <- scale(human)
pca_human_std <- prcomp(human_std)
biplot(pca_human_std, choices = 1:2, cex=c(0.8,1), col=c("grey40", "deeppink2"))

pca_human_s<-prcomp(human, scale. = TRUE)
sum_pca_human_s<-summary(pca_human_s)
pca_pr_s <- round(100*sum_pca_human_s$importance[2, ], digits = 1)
pc_lab<-paste0(names(pca_pr_s), " (", pca_pr_s, "%)")
sum_pca_human_var_s<-sum_pca_human_s$sdev^2
sum_pca_human_var_s
## [1] 4.2883701 1.2989625 0.7657100 0.6066276 0.4381862 0.2876242 0.2106805
## [8] 0.1038390
biplot(pca_human_s, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = pc_lab[1], ylab = pc_lab[2], main = "PCA plot of scaled human data")

After standardization the biplot looks quite different. PCA maximizes variance, so it treats variables with larger variances as more important than those with smaller variances. In the non-scaled analysis this let “GNI”, the variable with by far the largest values, dominate the result. Scaling gives every variable unit variance, so each contributes on an equal footing. The first principal component (PC1) now explains 53.6% of the variation and PC2 a further 16.2%, compared with 99.99% for PC1 alone when the data were not scaled.
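
The percentages on the biplot axes follow directly from the component variances printed above. The sketch below redoes that arithmetic with the eight values copied from the output, and then checks, on the built-in `mtcars` data as a stand-in, that scaling by hand with `scale()` gives the same component standard deviations as `prcomp(..., scale. = TRUE)`.

```r
# Component variances of the scaled human data, copied from the output above.
ev <- c(4.2883701, 1.2989625, 0.7657100, 0.6066276,
        0.4381862, 0.2876242, 0.2106805, 0.1038390)

# Each standardized variable has variance 1, so the total is (close to) 8.
print(sum(ev))

# Share of the total variance per component, in percent.
pct <- round(100 * ev / sum(ev), 1)
print(pct[1:2])   # PC1 and PC2

# Scaling first and letting prcomp() scale are two routes to the same result.
# mtcars is used here only because it ships with R.
a <- prcomp(scale(mtcars))
b <- prcomp(mtcars, scale. = TRUE)
print(all.equal(a$sdev, b$sdev))
```

This is why `pca_human_std` and `pca_human_s` in the code above describe the same analysis.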

Interpreting the two principal component dimensions: (1) Correlations between variables: the smaller the angle between two arrows, the stronger the positive correlation between the variables. On that basis, four of the variables, “Edu.Exp”, “Life.Exp”, “GNI” and “Edu2.FM”, are positively correlated with each other. Of these, the arrows of “GNI” and “Edu2.FM” appear closest in the plot, although the correlation matrix gives only 0.43 for that pair against 0.79 for “Edu.Exp” and “Life.Exp”; the two-dimensional biplot only approximates the correlations. In the same way, “Parli.F” and “Labo.FM” are correlated with each other (0.25), as are “Mat.Mor” and “Ado.Birth” (0.76). Arrows at roughly right angles indicate little correlation, and arrows pointing in nearly opposite directions indicate a strong negative correlation. The latter is the case for “Life.Exp” and “Ado.Birth” (-0.73), which have a large angle between them in the plot.
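
The link between arrow angles and correlations can be checked numerically. The sketch below assumes that the arrows are proportional to the loadings multiplied by the component standard deviations, which is my reading of the default biplot scaling; four `mtcars` columns stand in for the human data. With all components the cosine of the angle reproduces the correlation; with only the first two it is an approximation.

```r
x <- mtcars[, c("mpg", "wt", "hp", "qsec")]
pca <- prcomp(x, scale. = TRUE)

# Arrow coordinates: loadings stretched by the component standard deviations.
arrows_all <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Cosine of the angle between two vectors.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

cos_full <- cosine(arrows_all["mpg", ], arrows_all["wt", ])        # all components
cos_2d   <- cosine(arrows_all["mpg", 1:2], arrows_all["wt", 1:2])  # PC1 and PC2 only

print(c(correlation = cor(x$mpg, x$wt), cos_full = cos_full, cos_2d = cos_2d))
```

The gap between `cos_full` and `cos_2d` shows how much the two-dimensional picture can distort a single pair, so it is worth confirming any reading of the plot against `cor()`.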

(2) Correlation between variables and principal components: the smaller the angle between an arrow and a PC axis, the more strongly the variable is correlated with that component. “Parli.F” and “Labo.FM” are nearly uncorrelated with the other six variables, so they cannot drive a first component that explains more than half of the variance; they mainly define PC2. The remaining variables align with PC1: “Mat.Mor” and “Ado.Birth” point in one direction along it and “Life.Exp”, “Edu.Exp”, “Edu2.FM” and “GNI” in the opposite direction, so these six variables carry the highest weights on PC1.
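
How strongly each variable is tied to each component does not have to be read off the plot. For a PCA on standardized data, the correlation between variable j and component k equals the loading times the component standard deviation. The sketch below verifies this on four `mtcars` columns, used as a stand-in; the same two lines applied to `pca_human_s` and `human` would give the table for this chapter.

```r
x <- mtcars[, c("mpg", "wt", "hp", "qsec")]
pca <- prcomp(x, scale. = TRUE)

# Correlation of every variable with every component, computed directly
# from the data and the component scores.
direct <- cor(x, pca$x)

# The same quantity from the PCA object: rotation[j, k] * sdev[k].
from_loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

print(round(direct, 3))
print(max(abs(direct - from_loadings)))  # numerically zero
```

A variable with a large absolute value in the PC1 column and a small one in the PC2 column is one whose arrow runs along the horizontal axis of the biplot.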

We will use tea data from the FactoMineR package to practice multiple correspondence analysis (MCA). In this data, there are 300 observations and 36 variables.

library(FactoMineR)
data("tea")
str(tea)
## 'data.frame':    300 obs. of  36 variables:
##  $ breakfast       : Factor w/ 2 levels "breakfast","Not.breakfast": 1 1 2 2 1 2 1 2 1 1 ...
##  $ tea.time        : Factor w/ 2 levels "Not.tea time",..: 1 1 2 1 1 1 2 2 2 1 ...
##  $ evening         : Factor w/ 2 levels "evening","Not.evening": 2 2 1 2 1 2 2 1 2 1 ...
##  $ lunch           : Factor w/ 2 levels "lunch","Not.lunch": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dinner          : Factor w/ 2 levels "dinner","Not.dinner": 2 2 1 1 2 1 2 2 2 2 ...
##  $ always          : Factor w/ 2 levels "always","Not.always": 2 2 2 2 1 2 2 2 2 2 ...
##  $ home            : Factor w/ 2 levels "home","Not.home": 1 1 1 1 1 1 1 1 1 1 ...
##  $ work            : Factor w/ 2 levels "Not.work","work": 1 1 2 1 1 1 1 1 1 1 ...
##  $ tearoom         : Factor w/ 2 levels "Not.tearoom",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ friends         : Factor w/ 2 levels "friends","Not.friends": 2 2 1 2 2 2 1 2 2 2 ...
##  $ resto           : Factor w/ 2 levels "Not.resto","resto": 1 1 2 1 1 1 1 1 1 1 ...
##  $ pub             : Factor w/ 2 levels "Not.pub","pub": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Tea             : Factor w/ 3 levels "black","Earl Grey",..: 1 1 2 2 2 2 2 1 2 1 ...
##  $ How             : Factor w/ 4 levels "alone","lemon",..: 1 3 1 1 1 1 1 3 3 1 ...
##  $ sugar           : Factor w/ 2 levels "No.sugar","sugar": 2 1 1 2 1 1 1 1 1 1 ...
##  $ how             : Factor w/ 3 levels "tea bag","tea bag+unpackaged",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ where           : Factor w/ 3 levels "chain store",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ price           : Factor w/ 6 levels "p_branded","p_cheap",..: 4 6 6 6 6 3 6 6 5 5 ...
##  $ age             : int  39 45 47 23 48 21 37 36 40 37 ...
##  $ sex             : Factor w/ 2 levels "F","M": 2 1 1 2 2 2 2 1 2 2 ...
##  $ SPC             : Factor w/ 7 levels "employee","middle",..: 2 2 4 6 1 6 5 2 5 5 ...
##  $ Sport           : Factor w/ 2 levels "Not.sportsman",..: 2 2 2 1 2 2 2 2 2 1 ...
##  $ age_Q           : Factor w/ 5 levels "15-24","25-34",..: 3 4 4 1 4 1 3 3 3 3 ...
##  $ frequency       : Factor w/ 4 levels "1/day","1 to 2/week",..: 1 1 3 1 3 1 4 2 3 3 ...
##  $ escape.exoticism: Factor w/ 2 levels "escape-exoticism",..: 2 1 2 1 1 2 2 2 2 2 ...
##  $ spirituality    : Factor w/ 2 levels "Not.spirituality",..: 1 1 1 2 2 1 1 1 1 1 ...
##  $ healthy         : Factor w/ 2 levels "healthy","Not.healthy": 1 1 1 1 2 1 1 1 2 1 ...
##  $ diuretic        : Factor w/ 2 levels "diuretic","Not.diuretic": 2 1 1 2 1 2 2 2 2 1 ...
##  $ friendliness    : Factor w/ 2 levels "friendliness",..: 2 2 1 2 1 2 2 1 2 1 ...
##  $ iron.absorption : Factor w/ 2 levels "iron absorption",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ feminine        : Factor w/ 2 levels "feminine","Not.feminine": 2 2 2 2 2 2 2 1 2 2 ...
##  $ sophisticated   : Factor w/ 2 levels "Not.sophisticated",..: 1 1 1 2 1 1 1 2 2 1 ...
##  $ slimming        : Factor w/ 2 levels "No.slimming",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ exciting        : Factor w/ 2 levels "exciting","No.exciting": 2 1 2 2 2 2 2 2 2 2 ...
##  $ relaxing        : Factor w/ 2 levels "No.relaxing",..: 1 1 2 2 2 2 2 2 2 2 ...
##  $ effect.on.health: Factor w/ 2 levels "effect on health",..: 2 2 2 2 2 2 2 2 2 2 ...
dim(tea)
## [1] 300  36
summary(tea)
##          breakfast           tea.time          evening          lunch    
##  breakfast    :144   Not.tea time:131   evening    :103   lunch    : 44  
##  Not.breakfast:156   tea time    :169   Not.evening:197   Not.lunch:256  
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##         dinner           always          home           work    
##  dinner    : 21   always    :103   home    :291   Not.work:213  
##  Not.dinner:279   Not.always:197   Not.home:  9   work    : 87  
##                                                                 
##                                                                 
##                                                                 
##                                                                 
##                                                                 
##         tearoom           friends          resto          pub     
##  Not.tearoom:242   friends    :196   Not.resto:221   Not.pub:237  
##  tearoom    : 58   Not.friends:104   resto    : 79   pub    : 63  
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##         Tea         How           sugar                     how     
##  black    : 74   alone:195   No.sugar:155   tea bag           :170  
##  Earl Grey:193   lemon: 33   sugar   :145   tea bag+unpackaged: 94  
##  green    : 33   milk : 63                  unpackaged        : 36  
##                  other:  9                                          
##                                                                     
##                                                                     
##                                                                     
##                   where                 price          age        sex    
##  chain store         :192   p_branded      : 95   Min.   :15.00   F:178  
##  chain store+tea shop: 78   p_cheap        :  7   1st Qu.:23.00   M:122  
##  tea shop            : 30   p_private label: 21   Median :32.00          
##                             p_unknown      : 12   Mean   :37.05          
##                             p_upscale      : 53   3rd Qu.:48.00          
##                             p_variable     :112   Max.   :90.00          
##                                                                          
##            SPC               Sport       age_Q          frequency  
##  employee    :59   Not.sportsman:121   15-24:92   1/day      : 95  
##  middle      :40   sportsman    :179   25-34:69   1 to 2/week: 44  
##  non-worker  :64                       35-44:40   +2/day     :127  
##  other worker:20                       45-59:61   3 to 6/week: 34  
##  senior      :35                       +60  :38                    
##  student     :70                                                   
##  workman     :12                                                   
##              escape.exoticism           spirituality        healthy   
##  escape-exoticism    :142     Not.spirituality:206   healthy    :210  
##  Not.escape-exoticism:158     spirituality    : 94   Not.healthy: 90  
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##          diuretic             friendliness            iron.absorption
##  diuretic    :174   friendliness    :242   iron absorption    : 31   
##  Not.diuretic:126   Not.friendliness: 58   Not.iron absorption:269   
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##          feminine             sophisticated        slimming  
##  feminine    :129   Not.sophisticated: 85   No.slimming:255  
##  Not.feminine:171   sophisticated    :215   slimming   : 45  
##                                                              
##                                                              
##                                                              
##                                                              
##                                                              
##         exciting          relaxing              effect.on.health
##  exciting   :116   No.relaxing:113   effect on health   : 66    
##  No.exciting:184   relaxing   :187   No.effect on health:234    
##                                                                 
##                                                                 
##                                                                 
##                                                                 
## 
library(tidyr)
library(dplyr)
keep<- c("breakfast","tea.time","friends","frequency","Tea","sugar","sex","sophisticated")
my_tea <- dplyr::select(tea, one_of(keep))
gather(my_tea) %>% ggplot(aes(value)) + geom_bar() + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8)) + facet_wrap("key", scales = "free")
## Warning: attributes are not identical across measure variables;
## they will be dropped

mca_tea <- MCA(my_tea, graph=FALSE)
summary(mca_tea, nbelements=Inf, nbind=5)
## 
## Call:
## MCA(X = my_tea, graph = FALSE) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
## Variance               0.213   0.189   0.159   0.136   0.131   0.118
## % of var.             15.481  13.717  11.556   9.865   9.518   8.606
## Cumulative % of var.  15.481  29.198  40.754  50.619  60.137  68.743
##                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11
## Variance               0.112   0.093   0.091   0.072   0.061
## % of var.              8.150   6.766   6.644   5.254   4.444
## Cumulative % of var.  76.893  83.658  90.302  95.556 100.000
## 
## Individuals (the 5 first)
##                      Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## 1                 |  0.359  0.202  0.071 |  1.116  2.201  0.686 | -0.040
## 2                 | -0.198  0.061  0.023 |  0.845  1.261  0.419 |  0.349
## 3                 | -0.484  0.367  0.226 | -0.243  0.105  0.057 | -0.211
## 4                 |  0.779  0.951  0.499 |  0.345  0.210  0.098 | -0.071
## 5                 | -0.065  0.007  0.003 |  0.816  1.176  0.480 | -0.026
##                      ctr   cos2  
## 1                  0.003  0.001 |
## 2                  0.255  0.071 |
## 3                  0.094  0.043 |
## 4                  0.011  0.004 |
## 5                  0.001  0.000 |
## 
## Categories
##                       Dim.1     ctr    cos2  v.test     Dim.2     ctr
## breakfast         |  -0.545   8.384   0.275  -9.060 |   0.576  10.563
## Not.breakfast     |   0.503   7.739   0.275   9.060 |  -0.532   9.750
## Not.tea time      |   0.663  11.263   0.340  10.090 |   0.345   3.447
## tea time          |  -0.514   8.730   0.340 -10.090 |  -0.268   2.672
## friends           |  -0.115   0.504   0.025  -2.721 |  -0.375   6.083
## Not.friends       |   0.216   0.950   0.025   2.721 |   0.706  11.465
## 1/day             |   0.296   1.631   0.041   3.487 |   0.609   7.774
## 1 to 2/week       |   1.072   9.899   0.198   7.686 |  -1.161  13.109
## +2/day            |  -0.727  13.148   0.388 -10.775 |   0.105   0.308
## 3 to 6/week       |   0.502   1.674   0.032   3.100 |  -0.589   2.607
## black             |  -0.394   2.246   0.051  -3.896 |   0.301   1.477
## Earl Grey         |   0.030   0.034   0.002   0.701 |  -0.174   1.295
## green             |   0.707   3.224   0.062   4.295 |   0.345   0.869
## No.sugar          |  -0.467   6.621   0.233  -8.352 |  -0.031   0.033
## sugar             |   0.499   7.078   0.233   8.352 |   0.033   0.035
## F                 |  -0.443   6.832   0.286  -9.249 |  -0.357   5.014
## M                 |   0.646   9.969   0.286   9.249 |   0.521   7.315
## Not.sophisticated |  -0.056   0.052   0.001  -0.606 |   0.786  11.599
## sophisticated     |   0.022   0.020   0.001   0.606 |  -0.311   4.586
##                      cos2  v.test     Dim.3     ctr    cos2  v.test  
## breakfast           0.306   9.573 |  -0.244   2.256   0.055  -4.060 |
## Not.breakfast       0.306  -9.573 |   0.226   2.082   0.055   4.060 |
## Not.tea time        0.092   5.254 |   0.157   0.844   0.019   2.386 |
## tea time            0.092  -5.254 |  -0.121   0.654   0.019  -2.386 |
## friends             0.265  -8.898 |  -0.294   4.448   0.163  -6.983 |
## Not.friends         0.265   8.898 |   0.554   8.382   0.163   6.983 |
## 1/day               0.172   7.164 |  -0.206   1.058   0.020  -2.426 |
## 1 to 2/week         0.232  -8.325 |   0.110   0.139   0.002   0.787 |
## +2/day              0.008   1.552 |   0.021   0.015   0.000   0.312 |
## 3 to 6/week         0.044  -3.642 |   0.355   1.123   0.016   2.194 |
## black               0.030   2.974 |   0.821  13.085   0.221   8.125 |
## Earl Grey           0.055  -4.047 |  -0.535  14.485   0.516 -12.424 |
## green               0.015   2.098 |   1.287  14.344   0.205   7.827 |
## No.sugar            0.001  -0.552 |   0.568  13.095   0.344  10.148 |
## sugar               0.001   0.552 |  -0.607  13.998   0.344 -10.148 |
## F                   0.186  -7.458 |   0.027   0.035   0.001   0.569 |
## M                   0.186   7.458 |  -0.040   0.051   0.001  -0.569 |
## Not.sophisticated   0.244   8.545 |  -0.564   7.100   0.126  -6.136 |
## sophisticated       0.244  -8.545 |   0.223   2.807   0.126   6.136 |
## 
## Categorical variables (eta2)
##                     Dim.1 Dim.2 Dim.3  
## breakfast         | 0.275 0.306 0.055 |
## tea.time          | 0.340 0.092 0.019 |
## friends           | 0.025 0.265 0.163 |
## frequency         | 0.449 0.359 0.030 |
## Tea               | 0.094 0.055 0.533 |
## sugar             | 0.233 0.001 0.344 |
## sex               | 0.286 0.186 0.001 |
## sophisticated     | 0.001 0.244 0.126 |
plot(mca_tea, invisible = c("ind"), habillage = "quali", sub = "MCA of tea dataset")

In the MCA factor map, categories that lie close together tend to be chosen by the same respondents, so nearby categories can be read as similar response profiles. For example, “tea time” and “friends” fall on the same side of the first dimension, and so do “Not.tea time” and “Not.friends”: respondents who drink tea at tea time also tend to drink it with friends. On the first dimension, “F” lies on the same side as “tea time”, “friends” and “No.sugar”, whereas “M” lies with “Not.tea time”, “Not.friends” and “sugar”. This suggests that, in this sample, women are somewhat more likely than men to drink tea socially and without sugar. These are associations between categories, not statements about individuals, and the first two dimensions capture only about 29% of the total variance, so the pattern should be read with caution.
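
The eigenvalue table in the MCA summary can be sanity-checked by hand. MCA can be computed as a correspondence analysis of the indicator (one-hot) matrix, and for J categories over Q variables it yields at most J - Q dimensions with a total inertia of J/Q - 1. For the eight tea variables kept above (19 categories) that is 11 dimensions and 19/8 - 1 = 1.375, which matches the sum of the variances in the table. The sketch below reproduces the mechanics in base R on a small made-up data frame; it is a simplified illustration and not the implementation used by `MCA()`.

```r
# Small hypothetical data set with three categorical variables.
toy <- data.frame(
  sugar = factor(rep(c("sugar", "No.sugar"), times = 6)),
  sex   = factor(rep(c("F", "M", "F"), times = 4)),
  Tea   = factor(rep(c("black", "Earl Grey", "green", "green"), times = 3))
)
Q <- ncol(toy)  # number of variables

# Indicator matrix: one 0/1 column per category of every variable.
Z <- do.call(cbind, lapply(toy, function(f) {
  sapply(levels(f), function(l) as.numeric(f == l))
}))
J <- ncol(Z)    # total number of categories

# Correspondence analysis of Z via the SVD of the standardized residuals.
P <- Z / sum(Z)
r <- rowSums(P)         # row masses
cm <- colSums(P)        # column masses
S <- (P - r %o% cm) / sqrt(r %o% cm)
eig <- svd(S)$d^2
eig <- eig[eig > 1e-12] # drop numerically zero dimensions

print(round(eig, 3))
print(c(total_inertia = sum(eig), expected = J / Q - 1))
```

Because the total inertia depends only on the number of categories and variables, the percentage explained by the first dimensions of an MCA is typically low and should not be judged by PCA standards.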